Abstract


DPUP is a library of utilities that support distributed concurrent computing on a local
area  network  of  computers.  The  library  is  built  upon  the  interprocess  communication  facilities 
in  Berkeley  Unix  4.2bsd.  Thus  it  will  run  on  any  network,  connected  by  an  Ethernet,  where 
each  computer  runs  a  version  of  the  Unix  operating  system  that  supports  the  Berkeley  Unix 
interprocess  communication  facilities.  DPUP  supports  two  models  of  distributed  concurrent 
computation,  a  master-slave  model  based  upon  stream  sockets,  and  a  broadcast  model  based 
upon  datagram  sockets.  With  each  model,  facilities  for  creating  and  terminating  remote 
processes,  establishing  communications  between  them,  and  sending  and  receiving  data  between 
these  processes  are  provided.  This  paper  describes  the  facilities  provided  in  DPUP  and  gives 
examples  of  their  use. 


1.  Introduction 


This  paper  describes  a  library  of  utilities  that  support  distributed  concurrent  computing 
on a local area network of computers. The library is built upon the interprocess communication
facilities in Berkeley Unix 4.2bsd. Thus it will run on any network of computers where
each  machine  runs  a  version  of  the  Unix  operating  system  that  supports  the  Berkeley  Unix 
interprocess  communication  facilities.  The  library  is  called  DPUP  for  Distributed  Processing 
Utilities Package. It is written in C and can be used by C or FORTRAN applications programs.

The  purpose  of  DPUP  is  to  make  it  easier  to  use  a  local  area  network  of  computers  as  a 
loosely  coupled  multiprocessor.  It  is  clear  that  networks  of  computers,  especially  computer 
workstations,  are  becoming  an  increasingly  common  computing  environment  in  industry  and 
research  laboratories.  It  is  inevitable  that  some  users  of  these  networks  will  want  to  utilize  a 
number  of  computers  simultaneously,  as  a  loosely  coupled  multiprocessor,  to  solve  a  single 
problem.  This  may  be  especially  appropriate  during  non-peak  hours  when  many  machines  are 
idle  and  large  jobs,  such  as  number  crunching,  need  to  be  run. 

The premise for the use of a network of computers as a loosely coupled multiprocessor is
that important, expensive problems can make effective use of this parallel computing environment.
This appears to be the case. In particular, it appears that many important problems in
numerical computation can be effectively solved by coarse grain parallel algorithms that
predominantly involve independent concurrent processing and require only a small amount of
interprocess communication, shared data, and process creation and termination. Examples of
such problems include problems from optimization, VLSI design, and differential equations (see
e.g. Feijoo and Meyer [1984], McBryan and Van de Velde [1985], Schnabel [1985], Seitz [1985]).
Thus, many problems appear well suited for concurrent implementation in a loosely coupled
parallel computing environment, such as a network of computers, where interprocess communication
is considerably slower than each processor's computation speed.

In order to use a network of computers effectively as a loosely coupled multiprocessor, it
is necessary to have support for interprocess communication, and software that makes distributed,
concurrent programs easy to write. Several major projects, all initiated before the
release of Berkeley Unix 4.2, have provided this sort of support. They include the Crystal project
at the University of Wisconsin (Cook et al [1983a, b]), the Locus project at UCLA (Popek
et al [1981], Walker et al [1983]), the Eden project at the University of Washington (Lazowska
et al [1981], Almes et al [1985]), the Spice project at Carnegie-Mellon University, and several
projects at Xerox PARC (Birrell et al [1981], Shoch and Hupp [1982]). In contrast, DPUP is a
far simpler system that builds upon the interprocess communication primitives in Berkeley
Unix 4.2.

The 4.2 release of the Berkeley Unix operating system was the first major operating system
to provide networking support for commercially available hardware devices. It includes a
library of low level interprocess communication primitives and a kernel implementation of the
TCP/IP protocol. The fundamental abstraction provided is the socket, a generalization of the
Unix pipe. There are two types of sockets: the stream socket, a reliable communication
medium between two processes (on the same or different machines) based upon the TCP protocol,
and the datagram socket, which can broadcast to an arbitrary number of processes, using
the UDP protocol. The stream socket provides a flow controlled byte stream which is
guaranteed to be reliable, in that data will be delivered without error or duplication in the
order that it is sent. The datagram socket is not guaranteed to be reliable, although practice
has shown that it usually is.

To  write  distributed  concurrent  programs,  it  is  desirable  to  have  higher  level  operations 
than  the  ones  provided  in  Berkeley  Unix.  Obvious  desirable  features  include  the  ability  to 
easily  create  remote  processes  and  establish  communication  paths  between  them,  the  ability  to 
signal  or  kill  remote  processes,  and  the  ability  to  send  and  receive  data  between  processes 
easily.  More  sophisticated  features  would  include  update  protocols  on  data  that  is  sent 
between processes, and features that facilitate the debugging and testing of distributed
concurrent programs.

The DPUP system provides such facilities at a fairly basic level. In conjunction, it provides
two models of distributed computation. These are a master/slave model based upon
stream  sockets,  and  a  broadcast  model  based  upon  datagram  sockets.  In  both  models,  basic 
operations  for  initiating  and  terminating  remote  processes,  and  sending  and  receiving  data 
between  these  processes,  are  supported.  Some  simple  update  protocols  for  communicated  data 
are  provided.  Only  very  rudimentary  debugging  support  (beyond  the  standard  Unix  support 
such  as  DBX)  is  included. 

An  advantage  of  the  simple  approach  taken  in  DPUP  is  that  the  system  is  portable  to 
any  Berkeley  Unix  4.2  (or  equivalent)  environment.  At  the  University  of  Colorado,  DPUP  has 
been used on networks of Sun workstations (Sun-2’s or 3’s), on Vaxes, on Pyramids, and
between  various  combinations  of  these  machines.  The  concurrent  distributed  algorithms  that 
have  been  implemented  using  DPUP  are  in  areas  including  global  optimization  (Byrd  et  al 
[1986]),  discrete  optimization  (Trienekens  [1986]),  VLSI  design  (Moceyunas  [1986]),  and  solving 
systems of equations. The DPUP system also has been ported to many other machines including
Apollo, Celerity, Gould, ISI, Masscomp, MicroVax, and Symmetric, and the Sequent multiprocessor.
A second, more sophisticated system built at the University of Colorado, which is
not  portable  because  it  includes  kernel  modifications  to  support  shared  memory,  is  described  in 
Harter  and  Maybee  [1985].  Other  systems  that  support  the  use  of  a  network  of  computers  for 
distributed  concurrent  computation  include  Carriero  and  Gelernter  [1986],  Cooper  [1982],  Su  et 
al  [1985],  and  Theimer  et  al  [1985]. 

The  remainder  of  this  paper  describes  the  facilities  provided  in  DPUP  and  gives  examples 
of  their  use.  Section  2  describes  the  two  models  of  computation,  point  to  point  and  broadcast, 
supported  in  DPUP.  In  Sections  3  and  4  we  discuss  the  utilities  DPUP  provides  along  with  each 
of these models of computation. Section 5 briefly summarizes experience at the University of
Colorado using DPUP to implement and test distributed concurrent applications. A simple
example of a distributed concurrent computation coded in both the point to point and broadcast
modes is given in Appendix A.

Additional  information  about  the  DPUP  system,  including  detailed  descriptions  of  the 
DPUP  functions,  sample  programs,  and  the  DPUP  source  code,  is  available  from  the  authors. 

2.  Models  of  Computation 

The  DPUP  System  is  primarily  intended  to  support  two  models  of  distributed  concurrent 
computation.  The  first  is  a  master/slave  model  and  uses  point  to  point  communications  based 
upon  stream  sockets.  The  second  is  a  broadcast  model  based  upon  the  datagram  sockets.  This 
section briefly describes these two models, their advantages and disadvantages, and the systems
architectures underlying them. While it is possible to construct other distributed computation
environments using DPUP, for example by combining the point to point and broadcast
facilities,  we  do  not  discuss  such  possibilities  in  any  detail. 

The  master/slave  model  is  a  simple  model  of  concurrent  computation.  The  computation 
is  organized  around  one  master  process  which  creates  an  almost  arbitrary  number  of  slaves. 
Each  slave  is  connected  (via  a  stream  socket)  to  the  master,  and  any  communication  between 
slaves  is  done  through  the  master.  Actually,  it  is  possible,  using  DPUP,  for  any  slave  process 
itself to act as a submaster and create its own slaves, so that an arbitrary tree structure is possible.
The distributed applications that have used DPUP have not used this generality and it is
not  discussed  further  here. 

A high level diagram of the architecture underlying the DPUP master/slave model is
given in Figure 2.1.

Each machine that is a part of the distributed computation has a server process, called
dp_server.  Slave  processes  are  created  by  the  master  as  children  of  the  dp_server  process  on 
the  remote  host.  Point  to  point  communication  paths  then  are  created  directly  between  the 
master  process  and  the  slave,  so  that  subsequent  communication  bypasses  the  dp_server. 
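
The shape of the master/slave model can be illustrated without DPUP itself. The following is a minimal sketch, assuming a single machine, in which fork and socketpair stand in for dp_server and the remote process creation call. The names run_master_slave, slave_main, and read_all, and the task of squaring an integer, are hypothetical and are not part of DPUP. Each slave is connected to the master by its own stream socket, and all data passes through the master.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_SLAVES 16

/* Read exactly len bytes from a stream socket; 0 on success, -1 on error or EOF. */
static int read_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Slave body: receive one integer from the master and send back its square. */
static void slave_main(int fd)
{
    long task, result;
    if (read_all(fd, &task, sizeof task) == 0) {
        result = task * task;
        if (write(fd, &result, sizeof result) < 0)
            _exit(1);
    }
    close(fd);
    _exit(0);
}

/* Master: create nslaves slaves, hand slave i the task i+1, and sum the replies.
 * Returns the sum, or -1 on failure. */
long run_master_slave(int nslaves)
{
    int fds[MAX_SLAVES];
    pid_t pids[MAX_SLAVES];
    long sum = 0;
    int i, j;

    assert(nslaves > 0 && nslaves <= MAX_SLAVES);

    for (i = 0; i < nslaves; i++) {
        int sv[2];
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
            return -1;
        pids[i] = fork();
        if (pids[i] < 0)
            return -1;
        if (pids[i] == 0) {
            /* In the slave: drop the master's ends of every socket. */
            close(sv[0]);
            for (j = 0; j < i; j++)
                close(fds[j]);
            slave_main(sv[1]);
        }
        close(sv[1]);
        fds[i] = sv[0];       /* master keeps one descriptor per slave */
    }
    for (i = 0; i < nslaves; i++) {
        long task = i + 1;
        if (write(fds[i], &task, sizeof task) != (ssize_t)sizeof task)
            return -1;
    }
    for (i = 0; i < nslaves; i++) {
        long result;
        if (read_all(fds[i], &result, sizeof result) < 0)
            return -1;
        sum += result;
        close(fds[i]);
        waitpid(pids[i], NULL, 0);
    }
    return sum;
}
```

Note that the master holds one open descriptor per slave for the life of the computation, which is the source of the scaling limit discussed below.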

The  master/slave  model  is  appropriate  for  many  distributed  concurrent  computations.  It 
is  simple  to  understand  and  use,  and  for  many  applications  the  centralized  control  mirrors  the 
natural  structure  of  the  parallel  algorithm.  All  the  distributed  applications  projects  at  the 
University  of  Colorado  mentioned  in  Section  1,  in  optimization,  VLSI  design,  and  other  areas, 
have  used  this  model. 

The  master/slave  model  has  several  disadvantages,  however.  The  main  disadvantage  is 
that in parallel algorithms where direct communication between slaves (as opposed to
communication between a slave and the controlling process) is involved, requiring all
communication to go through the master is unnecessary and may create a bottleneck. In particular, in
applications  where  neighboring  processes  need  to  communicate,  or  in  applications  where  each 
process  needs  to  send  data  to  all  processes,  the  master/slave  model  may  be  inefficient.  Another 
disadvantage  is  that  various  operating  system  constraints  usually  limit  the  number  of  point  to 
point  connections  (file  descriptors)  per  process,  so  that  the  master/slave  model  doesn’t  scale  up 
indefinitely. Finally, using the master/slave model usually results in a certain degree of
synchronization between the slaves and the master, which inevitably causes the slaves to be idle
at  times  in  an  environment  where  interprocessor  communication  is  slow.  For  some  parallel 
algorithms,  such  as  the  chaotic  relaxation  method  for  solving  systems  of  linear  equations  which 
is used as our example in Appendix A, this synchronization is unnecessary because it is not
critical that all processes have up-to-date values of all distributed shared variables at all times.
Thus using the master/slave model may introduce needless inefficiencies into distributed
implementation of such algorithms.

The broadcast model supported by DPUP provides an alternative to the master/slave
model that is especially suited to loosely coupled asynchronous algorithms. The model assumes
that all processes are equal, with no master process. (It often is convenient to have a control
process to initiate the distributed algorithm, monitor its progress, and decide when to terminate
it. However, this control process doesn’t relay messages between the other processes as
is  the  case  in  the  master/slave.)  In  the  simplest  use  of  the  broadcast  model,  all  processes  can 
be  thought  to  be  connected  on  one  common  communications  path,  and  whenever  one  process 
sends  a  message,  all  the  other  processes  hear  and  receive  it.  Actually,  it  is  possible  to  have 
communications  between  subgroups  of  processes  as  well. 

A high level diagram of the architecture underlying the broadcast model is given in Figure
2.2. Notice that each computer that is part of the distributed computation now has two
servers, the dp_server used in the master/slave model and a broadcast server, called bc_server.
The dp_server is used to create processes and help connect them properly to the broadcast system.
The broadcast server handles all the broadcast communication for that node. It contains
copies of all the variables that are shared through the broadcast system (called "broadcast
variables"); the application processes contain their own copies of the broadcast variables as
well. Whenever data is broadcast, it is received by the bc_server (and not directly by the
application processes). The application processes then query their bc_server when they are
interested in the values of their broadcast variables. When broadcast data is sent by an application
process, it goes directly to the datagram socket (bypassing its own bc_server), to be
received  by  all  bc_servers  including  its  own.  There  are  facilities  in  the  broadcast  system  for 
multiple  applications  to  simultaneously  use  the  broadcast  system,  and  for  multiple  broadcast 
groups  to  exist  within  a  single  application.  These  are  described  in  more  detail  in  Section  4. 

When  using  the  broadcast  system,  it  is  possible  that  all  application  processes  may  not 
contain  up-to-date  values  of  all  broadcast  variables  at  any  given  time.  It  is  also  possible  that 
a given update of broadcast variables may never reach one or more bc_servers (and hence all
application processes on their computers) due either to the unreliable nature of the datagram
socket  or  to  the  prototypical  implementation  of  the  DPUP  broadcast  system.  In  particular, 
the  DPUP  broadcast  system  sometimes  discards  messages  if  several  messages  are  sent  in  close 
succession.  In  the  applications  for  which  the  broadcast  system  is  primarily  intended,  this 
should  not  be  a  problem.  The  main  application  of  the  broadcast  system  should  be  for  coarse 
grain  parallel  algorithms  where  interprocessor  communication  is  infrequent.  In  our  experience, 
such  algorithms  rarely  lose  broadcast  messages.  Furthermore,  many  such  applications  are 
iterative and if a message is lost, this does not seriously affect the overall efficiency of the calculation.
The iterative calculation proceeds using some out of date values, and eventually
these  are  updated  by  newer  values  of  the  same  variables. 
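
The tolerance of iterative methods to lost updates can be seen in a small simulation. The following is a sketch only, not the Appendix A program: it assumes a hypothetical 3 by 3 strictly diagonally dominant system whose exact solution is (1, 1, 1), simulates three processes in a single loop, and deliberately drops one message in three on every link. The function name chaotic_relaxation and the drop rule are assumptions made for illustration.

```c
#include <assert.h>

#define N 3

/* Simulate N processes that each own one unknown of A x = b and keep a local,
 * possibly stale, copy of the others. Returns the largest remaining error
 * against the exact solution (1, 1, 1) over all local copies. */
double chaotic_relaxation(int iterations)
{
    static const double A[N][N] = { {4, 1, 1}, {1, 4, 1}, {1, 1, 4} };
    static const double b[N] = { 6, 6, 6 };
    double view[N][N] = { {0} };   /* view[p][j]: process p's copy of x[j] */
    double max_err = 0.0;
    int it, p, q, j;

    assert(iterations > 0);

    for (it = 0; it < iterations; it++) {
        double fresh[N];

        /* Each process recomputes its own unknown from its current view. */
        for (p = 0; p < N; p++) {
            double s = b[p];
            for (j = 0; j < N; j++)
                if (j != p)
                    s -= A[p][j] * view[p][j];
            fresh[p] = s / A[p][p];
        }

        /* "Broadcast": deliver fresh[p] to every process q, except that the
         * message from p to q is lost whenever (it + p + q) is a multiple of 3. */
        for (p = 0; p < N; p++)
            for (q = 0; q < N; q++) {
                int lost = (q != p) && ((it + p + q) % 3 == 0);
                if (!lost)
                    view[q][p] = fresh[p];
            }
    }

    for (p = 0; p < N; p++)
        for (j = 0; j < N; j++) {
            double e = view[p][j] - 1.0;
            if (e < 0)
                e = -e;
            if (e > max_err)
                max_err = e;
        }
    return max_err;
}
```

A process that misses an update simply continues with the previous value until a later message overwrites it, so the iteration still converges, only somewhat more slowly.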


3.  Point-to-Point  Facilities 


The  point  to  point  facilities  in  DPUP  establish  direct  communication  paths  between  pairs 
of processes on distinct (or the same) computers. Generally, they are used to organize a distributed
computation in a master/slave hierarchy as discussed in Section 2. In addition, slave
processes may act as submasters and spawn their own slaves, so that an arbitrary tree structure
is possible. In fact, it is possible to establish communication paths between any pair of
processes,  thereby  enabling  the  implementation  of  any  communications  network. 


The  DPUP  point  to  point  functions  are: 


dp_create_proc    create a process
dp_rmt_setup      start up a remote process
dp_read           read data
dp_write          send data
dp_kill_proc      kill a process
dp_sig_proc       signal a process
dp_status         display status information
dp_close_proc     close a socket in the master process
dp_rmt_exit       exit a remote process

This section outlines the capabilities and use of these functions. More detailed information
is contained in the Unix manual pages for the DPUP functions, which are available from
the  authors. 

3.1  Using  The  System 

Each  computer  in  the  distributed  computation  must  contain  a  server  process,  called 
dp_server.  This  server  is  used  by  the  master  process  to  create  processes  on  remote  machines, 
and  to  establish  communications  directly  between  the  master  and  slaves. 

Before  a  user  program  can  create  a  process  on  a  remote  host,  dp_server  must  be  started 
on  the  remote  host.  The  dp_server  uses  a  hardwired  service  port  number,  usually  assigned  by 
the system administrator and placed in the /etc/services file to support the entire DPUP
library. An identical DPUP service port entry should exist in the /etc/services file on every
machine that intends to run the dp_server. This server should be started in the background on
each of the machines to be used by a distributed concurrent program. This is usually done by
adding a line to the file /etc/rc.local that is executed at boot time. dp_server, when running,
listens  for  service  requests  from  user  programs  on  any  of  the  participating  machines. 

In order to access the DPUP data structures, the applications program must include the
file dpup_user.h, which usually resides in the /usr/local/include directory. In addition, the
application program needs to be linked with the library libdpup_funs.a. This usually resides in
/usr/local/lib and is accessed by adding the flag -ldpup_funs to the compile command line.

3.2  Remote  Process  Creation 

Remote  processes  are  created  using  the  function  call: 

proc_fd = dp_create_proc(r_stat, host, proc, args, 0);

where  the  process  proc  with  arguments  args  is  started  on  host.  r_stat  is  a  pointer  to  a  data 
structure that is filled by the create function with status information about the process
created. The dp_create_proc function call returns the file descriptor of a bidirectional pipe for
communication with the remote process. In the case of an error, the value -1 is returned. An
error  will  occur  if  the  dp_server  process  is  not  running  on  the  remote  host  or  if  the  file  proc 
does  not  exist  or  cannot  be  executed  on  the  remote  host. 

The above example is the simplest form of the dp_create_proc function. Actually the
arguments host, proc, and args each can be arrays of equal size. In this case the arrays contain
pointers  to  a  list  of  possible  hosts,  the  corresponding  process  to  start  on  each  host,  and  the 
arguments  to  pass  to  each  process.  The  call  then  determines  the  least  utilized  machine  among 
the  entries  in  the  host  array,  utilizing  the  Unix  load  average  information,  and  creates  the 
corresponding  process  from  the  array  proc  there.  This  primitive  form  of  load  balancing  can  be 
used, if the number of processes is much larger than the number of processors, to attempt an
equitable  scheduling  of  resources. 
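
The selection step can be sketched as follows. This is an illustration of the idea only, not DPUP's code: it assumes the load average of each candidate host has already been obtained and stored in a parallel array, and the function name least_loaded is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the smallest load average (the first one on ties).
 * load[i] is assumed to be the load average reported for host[i]. */
size_t least_loaded(const double load[], size_t n)
{
    size_t best = 0, i;

    assert(load != NULL && n > 0);

    for (i = 1; i < n; i++)
        if (load[i] < load[best])
            best = i;
    return best;
}
```

The returned index would then select the matching entries of the host, proc, and args arrays, so that the process paired with the least utilized machine is the one created.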

3.3  Remote  Process  Startup 

Using  the  dp_create_proc  function  causes  a  remote  process  to  be  executed,  but  does  not 
connect  that  process  to  the  calling  program.  To  complete  this  connection  the  dp_rmt_setup 
function  is  used.  dp_rmt_setup  should  be  the  first  statement  executed  by  the  remote  process,  as 
follows: 

create_fd = dp_rmt_setup(&argc, argv);

The  call  returns  a  file  descriptor  of  a  bidirectional  pipe  that  connects  to  the  creator  process. 
The arguments argc and argv are the same arguments provided to a main C program. Thus
the remote process must define argc and argv even if it does not intend to use them, because
dp_create_proc uses them to pass along the process identification handle.

3.4 Data Transfer

Data  transfer  can  be  accomplished  in  a  number  of  ways.  In  addition  to  the  Berkeley  Unix 
system  calls: 

write(fd,  buf,  data_size); 
read(fd,  buf,  data_size); 
send(fd,  buf,  data_size,  flags); 
recv(fd, buf, data_size, flags);

the DPUP functions dp_write and dp_read may also be used:

dp_write(fd, buf, data_size);
dp_read(fd, buf, data_size, block_flag);

The advantage of the DPUP functions over the system functions is that dp_read allows the
user to specify either that the read should block until data_size bytes of data have been returned,
or that dp_read should return immediately with only the data currently available.
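
The two behaviours can be sketched with standard system calls. The helper below is hypothetical (the name read_with_flag is ours) and is not DPUP's implementation of dp_read; it only illustrates what a blocking and a non-blocking read of a fixed number of bytes look like, assuming a POSIX system that provides poll.

```c
#include <assert.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* block != 0: return only after size bytes have arrived (or on EOF or error).
 * block == 0: return at once with whatever is available, possibly 0 bytes.
 * Returns the number of bytes stored in buf, or -1 on error. */
ssize_t read_with_flag(int fd, void *buf, size_t size, int block)
{
    char *p = buf;
    size_t got = 0;
    struct pollfd pfd;
    int ready;

    assert(buf != NULL);

    if (!block) {
        pfd.fd = fd;
        pfd.events = POLLIN;
        pfd.revents = 0;
        ready = poll(&pfd, 1, 0);      /* zero timeout: do not wait */
        if (ready < 0)
            return -1;
        if (ready == 0)
            return 0;                  /* nothing available yet */
        return read(fd, buf, size);    /* take what is there */
    }

    while (got < size) {               /* a stream read may return short */
        ssize_t n = read(fd, p + got, size - got);
        if (n < 0)
            return -1;
        if (n == 0)
            break;                     /* EOF before size bytes arrived */
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

The blocking loop is needed because a plain read on a stream socket may return fewer bytes than requested, even when the sender wrote them in one call.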

Due to the way the DPUP data transfer functions are implemented, it is not possible to
send data with a system function and receive the data with a DPUP function. For example, it is
not possible to use dp_read to receive data sent using send.

An  additional  method  of  data  transfer  is  by  way  of  the  broadcast  facilities.  These  are 
discussed  in  Section  4. 


3.5  Remote  Process  Signaling 

A  master  process  may  send  signals  to  remote  processes  using  the  function  call: 
dp_sig_proc(r_stat,  sig); 

This function sends the signal sig to the remote process identified by r_stat. The function call:
dp_kill_proc(r_stat); 

is  a  special  form  of  dp_sig_proc  that  sends  the  KILL  signal  to  the  remote  process  identified  by 
r_stat.  This  function  provides  the  master  process  a  convenient  method  of  terminating  remote 
processes  and  should  be  used  to  clean  up  after  a  distributed  program  is  finished  to  prevent 
"zombie" processes.

3.6  Remote  Exit 

At  the  end  of  execution  the  remote  process  should  exit  using: 
dp_rmt_exit() 

This function will make a clean exit of the remote process, i.e., sockets and file descriptors are
closed  properly. 

3.7  Close  Master  Sockets 

It  sometimes  is  necessary  for  the  master  process  to  close  sockets  to  slaves  that  have 
exited, in order not to exceed the system imposed limit on file descriptors.

To  accomplish  this  the  master  calls: 
dp_close_sock(proc_fd) 

This  should  only  be  done  when  communication  with  the  remote  process  is  complete. 

3.8  Status  Information 

One of the primary issues in developing distributed programs is trying to diagnose problems
occurring between remote processes. When used in the standard manner, the debuggers in
Berkeley Unix cannot debug child processes and cannot see communication packets between
communicating processes. For this reason, a DPUP function call:
dp_status(); 

has  been  included. 

dp_status  is  used  to  provide  information  relating  to  communications  between  distributed 
processes  using  the  DPUP  functions.  This  call  is  mainly  intended  to  provide  status  information, 
such as a system call error or a DPUP function error.

3.9 Input/Output Multiplexing

An important facility in developing distributed applications is the ability to multiplex
I/O requests among multiple remote processes. In a master/slave application, the master can
use the Unix select system call to determine which of its slaves want to read data, to write
data or have exceptional conditions pending (EOF for example) by using the call:
select(nfds, &readfds, &writefds, &exceptfds, &timeout);

nfds  is  the  range  of  file  descriptors,  that  is,  at  least  one  greater  (because  C  counts  from  0)  than 
the  highest  value  a  file  descriptor  can  assume. 

readfds, writefds, and exceptfds are bit masks that indicate which client processes (file descriptors)
are ready to read/write/take exception. File descriptor 0 corresponds to a 1 in the least
significant or 0th bit, file descriptor 1 to the first bit, etc. The masks are usually formed by a
bitwise OR operation. For example, if the server had file descriptors 3, 5, 6, 8 open to client
processes, the mask to inquire about these processes would be (as a binary number):

...00101101000

These masks are usually formed in an applications program by an instruction of the form:
mask = mask | (1 << fd);
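
As a check on the example above, the following short sketch builds the mask for a list of descriptors with the same OR-and-shift instruction. The function name build_fd_mask is hypothetical, and the sketch assumes descriptors below 32 so that the mask fits in one unsigned long.

```c
#include <assert.h>
#include <stddef.h>

/* Build a 4.2bsd-style descriptor mask: bit fd is set for each listed descriptor. */
unsigned long build_fd_mask(const int *fds, size_t n)
{
    unsigned long mask = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        assert(fds[i] >= 0 && fds[i] < 32);
        mask = mask | (1UL << fds[i]);
    }
    return mask;
}
```

For descriptors 3, 5, 6, and 8 the result is 0x168, which is 101101000 in binary and agrees with the mask shown in the text.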

A timeout value may be specified if the select is not to wait indefinitely for input/output
requests. If the timeout interval is set to 0, the select takes the form of a poll, returning immediately; if
the timeout parameter is a null pointer, the select will block indefinitely(1). Select normally
returns the number of file descriptors selected. If the select call returns due to the timeout
expiring, then a value of 0 is returned; if the call is interrupted by a signal, a value of -1 is
returned and the system parameter errno is set to EINTR.


(1) To be more specific, a return takes place only when a descriptor is selectable, or when a signal is received by the caller,
interrupting the system call.
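
On current Unix systems the masks are declared as fd_set and manipulated with the FD_ZERO, FD_SET, and FD_ISSET macros rather than with explicit shifts. The following is a minimal sketch of how a master might wait for the first slave with data to read, under that assumption. The function name first_readable and its return convention are hypothetical.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait up to timeout_ms milliseconds for any of fds[0..n-1] to become readable.
 * Returns the index of the first ready descriptor, -1 on timeout, -2 on error. */
int first_readable(const int *fds, int n, long timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    int i, rc, maxfd = -1;

    assert(fds != NULL && n > 0);

    FD_ZERO(&readfds);
    for (i = 0; i < n; i++) {
        assert(fds[i] >= 0 && fds[i] < FD_SETSIZE);
        FD_SET(fds[i], &readfds);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* nfds is one greater than the highest descriptor of interest. */
    rc = select(maxfd + 1, &readfds, NULL, NULL, &tv);
    if (rc < 0)
        return -2;          /* error, including interruption by a signal */
    if (rc == 0)
        return -1;          /* timeout expired with nothing ready */

    for (i = 0; i < n; i++)
        if (FD_ISSET(fds[i], &readfds))
            return i;
    return -2;
}
```

A timeout of 0 milliseconds turns the call into a poll, as described above.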

The select function cannot be used for I/O multiplexing with the DPUP broadcast facilities.


4.  Broadcast  Facilities 

The  broadcast  facilities  in  DPUP  provide  an  alternative  to  the  point-to-point  facilities  of 
the  previous  section.  As  discussed  in  Section  2,  the  point-to-point  facilities  do  not  scale  up  to 
an  arbitrary  number  of  processes  due  to  system  constraints.  In  addition,  the  master/slave 
model that the point to point facilities generally are used to implement may not be appropriate
for some applications, where it causes a bottleneck at the master or leads to unnecessary
synchronization. To alleviate these problems and provide an environment especially appropriate
for asynchronous concurrent computations, the DPUP broadcast functions were developed.

The  broadcast  functions  provide  an  interface  to  a  datagram  based  broadcast  system. 
Groups  of  communicating  processes  sharing  the  same  data  and  using  the  broadcast  facilities 
are  identified  as  broadcast  groups.  The  broadcast  system  supports  multiple  broadcast  groups 
at  one  time;  processes  may  be  members  of  more  than  one  broadcast  group,  or  more  commonly, 
several  distributed  applications  that  use  the  broadcast  facilities  can  run  on  the  same  system  at 
the same time, each a member of a separate broadcast group.


The  DPUP  broadcast  functions  are: 


bc_open           open a connection to the broadcast server
bc_close          close the connection to the server
bc_create_grp     create a new broadcast group
bc_remove_grp     remove a group
bc_join_grp       join a group
bc_resign_grp     resign from a group
bc_send           broadcast data
bc_receive        receive data


A  brief  explanation  of  the  function  and  use  of  each  is  given  below.  For  a  more  detailed 
specification  of  a  function’s  arguments,  return  value,  or  error  diagnostics,  consult  the  specific 
DPUP manual entry.

4.1  Using  the  System 

Each  computer  using  the  broadcast  system  must  contain  two  server  processes,  the 
dp_server  discussed  in  the  previous  section  and  a  broadcast  server,  bc_server.  The  dp_server  is 
used  in  creating  processes  and  connecting  them  to  the  broadcast  system,  as  shown  in  Figure 
2.2.  The  broadcast  server  maintains  variables  which  function  as  segments  of  distributed 
shared  memory.  Each  segment  is  used  by  a  different  broadcast  group  to  communicate  with  its 
members. As a segment is created, the number of variables and the length of each variable are
specified. Data is broadcast by sending new values to a datagram socket which is read by the
bc_server processes on each machine; applications processes then read the data from their local
bc_server. Updating a block of variables is not an atomic operation (see Sections 4.9 and 4.11),
so that update protocols may be required. The current update strategy is to overwrite all
variables transmitted; a user may choose to implement other strategies. The applications program
need not transmit or receive all variables in a broadcast group; the broadcast variable
flags allow this to be controlled by the user.
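
The overwrite strategy with per-variable flags can be pictured as follows. This is a sketch under simplifying assumptions, not the DPUP data structures: every broadcast variable is taken to be a single double, the flags are taken to be one byte per variable, and the function name apply_update is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Overwrite local[i] with incoming[i] for every variable whose receive flag is
 * set; variables with a clear flag keep their current local value.
 * Returns the number of variables overwritten. */
size_t apply_update(double *local, const double *incoming,
                    const unsigned char *recv_flag, size_t n)
{
    size_t i, updated = 0;

    assert(local != NULL && incoming != NULL && recv_flag != NULL);

    for (i = 0; i < n; i++) {
        if (recv_flag[i]) {
            local[i] = incoming[i];
            updated++;
        }
    }
    return updated;
}
```

Because the variables are overwritten one at a time, a reader that looks at the block part way through an update can see a mixture of old and new values, which is why the update is not atomic.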

Before  a  user  program  utilizes  the  broadcast  system,  the  dp_server  and  bc_server  must  be 
running  on  each  participating  machine.  This  is  usually  done  by  adding  a  line  to  the  file 
/etc/rc.local that is executed at boot time. A service port must also exist in /etc/services for
both  servers.  This  is  described  in  more  detail  in  Section  3.1. 

In order to access the DPUP data structures, the applications programs must include the
file dpup_user.h, which in most installations would reside in the directory /usr/local/include. In
addition, when an applications program is linked, the libraries libdp_funs.a and libbc_funs.a
must be included. These would usually reside in the directory /usr/local/lib and be accessed by
adding the flags -ldp_funs and -lbc_funs to the compile command line.

4.2 Opening the Broadcast System

Before a process can make use of the DPUP broadcast system, the process must establish
a connection to the local broadcast server. This is accomplished by using the following
function call:

status  =  bc_open(); 

The return value status is 0 if the call was successful; -1 otherwise. A successful call creates
a datagram socket for broadcasting data and a stream socket for reading data from the server.

4.3  Closing  the  Broadcast  System 

Before an application program that has opened the broadcast system terminates, it
should close the system. Closing releases the open socket connections, which is important since
the total number of such connections to the bc_server is limited by the operating system. The
system is closed by using the function call:

status  =  bc_close(); 

The return value status is 0 if the call was successful; -1 otherwise.

4.4  Creating  a  Broadcast  Group 

To  create  a  new  broadcast  group  in  the  DPUP  broadcast  system,  the  following  function 
call  is  used: 

status  =  bc_create_grp(bc_id,  var_table,  num_vars); 

This function requests the local broadcast server to set up a new broadcast group containing
num_vars broadcast variables whose sizes are specified in var_table. A group identifier will be
assigned by the local broadcast server and returned in the bc_id argument. The return value
status is 0 if the call was successful; -1 otherwise. The local broadcast server will broadcast the
new group's parameters and id, thus informing the bc_servers on other hosts of the existence of
the new group. The other servers then create their own copies of the broadcast variables.

4.5 Remote Process Creation

After  an  initiating  process  has  created  the  broadcast  group,  remote  processes  may  be 
created.  This  is  done  using  the  point-to-point  function  call  dp_create_proc  described  in  Section 
3.2.  The  initiating  process  must  also  send  the  group  id  number  to  the  remote  process  using 
dp_write.  The  remote  process  must  begin  with  a  dp_rmt_setup,  as  described  in  Section  3.3,  and 
then  read  the  group  id  with  a  dp_read.  At  the  end  of  its  execution  the  remote  process  should 
exit  using  dp_rmt_exit. 

4.6 Joining a Broadcast Group

Once a broadcast group has been created, processes that have opened the broadcast
system may join the group by using the function call:

id_index  =  bc_join_grp(bc_id); 

(The process that created the group must also join it.) This function informs the local
broadcast server that a new process has joined the broadcast group identified by bc_id. The
function returns a broadcast identifier index that identifies the broadcast group. The process
then may transmit and receive data within this group. Processes may be members of multiple
broadcast groups at the same time.

4.7  Resigning  from  a  Broadcast  Group 

A process may resign from a broadcast group identified by id_index by using the function
call:

result  =  bc_resign_grp(id_index); 

Once a process resigns from a broadcast group, the process may no longer transmit or receive
broadcast data within that group unless the process once again joins the group.

If a process is the last member of a group to resign, and if another process has a
bc_remove_grp command pending, the group will be removed at this time.

4.8  Removing  a  Broadcast  Group 

To remove a broadcast group from the DPUP broadcast system, the following function is
used:

result  =  bc_remove_grp(bc_id); 

bc_id identifies the broadcast group that is to be removed. Currently any process may call this
function (i.e. there is no concept of superuser processes in the broadcast system at this time).
If the group is still active when bc_remove_grp is called, the removal is delayed until all
members of the group have resigned and the group is no longer active. It is important that
non-active groups be removed so that the broadcast server state tables do not become full of
non-active groups.
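
The interaction between resigning (Section 4.7) and removal (Section 4.8) can be modeled with a
member count and a pending-removal flag. The sketch below is a hypothetical single-process model
of that bookkeeping. The names grp_state, grp_request_remove and grp_resign are illustrative
assumptions and are not part of DPUP; the real bc_server presumably keeps richer state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one broadcast group's entry in a server state table. */
struct grp_state {
    int  members;         /* processes currently joined to the group */
    bool remove_pending;  /* a remove request is waiting for the group to empty */
    bool exists;          /* false once the group has been removed */
};

/* Request removal: immediate if the group is inactive, otherwise deferred. */
void grp_request_remove(struct grp_state *g)
{
    assert(g != NULL);
    if (g->members == 0)
        g->exists = false;
    else
        g->remove_pending = true;
}

/* A member resigns; the last resignation completes a pending removal. */
void grp_resign(struct grp_state *g)
{
    assert(g != NULL && g->members > 0);
    g->members--;
    if (g->members == 0 && g->remove_pending) {
        g->exists = false;
        g->remove_pending = false;
    }
}
```

In this model a group with two members survives a removal request and the first resignation,
and disappears only when the second member resigns.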

4.9  Broadcasting  Data 

A  process  sends  broadcast  data  using  the  function: 

bytessent  =  bc_send(id_index,  var_flags,  data); 

This function broadcasts variables within the broadcast group specified by id_index. var_flags
is an array used to specify which variables in this group are being transmitted. A variable will
be transmitted only if the associated var_flag is set to 1. All broadcast group data variables
must first be moved, if necessary, to a contiguous block of memory specified by data. The
return value bytessent is the number of bytes sent if the call was successful; -1 otherwise.

A bc_send is not an atomic operation: the data is sent in three parts, the header, the
var_flags, and the actual variables. Therefore, the data sent by one process may be interleaved
with data from other processes. In the current implementation, once the header packet is
received, the server awaits the two remaining parts and discards any data received from other
processes until the transaction is completed or a time limit is reached. Thus, it is possible that
some broadcast information will never be received; this is more likely the more frequently
broadcast data is sent. It is not possible, in the current implementation, to notify other
sending processes that their data may have been discarded.
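
The requirement that the transmitted variables occupy one contiguous block can be met with a
small packing helper. The following is a minimal sketch and not DPUP code: pack_vars and its
parameters are hypothetical, and the assumption that only the flagged variables are copied,
back to back in index order, is made here for illustration. The DPUP manual entry should be
consulted for the layout bc_send actually expects.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy every variable whose flag is 1 into the contiguous buffer `data`.
 * vars[i] points at variable i and var_len[i] is its length in bytes.
 * Returns the number of bytes packed, or -1 if `data` is too small.
 * (Hypothetical helper; the packed layout is an assumption.)
 */
long pack_vars(const void *const vars[], const size_t var_len[],
               const int var_flags[], int num_vars,
               char *data, size_t data_size)
{
    size_t used = 0;
    int i;

    assert(vars != NULL && var_len != NULL && var_flags != NULL && data != NULL);
    for (i = 0; i < num_vars; i++) {
        if (var_flags[i] != 1)
            continue;                   /* variable not selected for sending */
        if (var_len[i] > data_size - used)
            return -1;                  /* would overflow the caller's buffer */
        memcpy(data + used, vars[i], var_len[i]);
        used += var_len[i];
    }
    return (long) used;
}
```

The returned byte count could then be compared with the value returned by bc_send as a
consistency check, under the same layout assumption.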

4.10  Receiving  Broadcast  Data 

A  process  receives  broadcast  data  using  the  function: 

bytesrcvd  =  bc_receive(id_index,  mode,  data,  var_flags); 

This function is used to read broadcast data variables associated with the broadcast group
identified by id_index that have been received by the local broadcast server. The function
argument mode indicates which variables are to be read. The following modes are currently
implemented:

    mode    description
    0       all variables are returned
    1       only new variables are returned
    2       all variables requested by var_flags are returned
    3       only requested variables that are new are returned
    4       block until requested variables arrive

New variables are defined as variables that the process has not yet read. Upon return, the
argument var_flags is updated to indicate which variables have been read and the data is
placed in the location data. The return value bytesrcvd is the number of bytes received if the
call was successful; -1 otherwise. Mode 4 does not include a timeout parameter, and therefore
if the processes are not synchronized properly, this mode can block forever.

4.11 Timing Considerations
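
The five modes differ only in which variables qualify for return. A hedged sketch of that
selection rule is given below. The function select_vars and the is_new array are hypothetical
names used to model the server-side bookkeeping; mode 4 is modeled as a readiness test rather
than an actual blocking wait, and "arrive" is assumed to mean that every requested variable is
new.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Decide which variables a receive in the given mode would return.
 *   is_new[i]    : 1 if variable i has not yet been read by this process
 *   requested[i] : 1 if the caller asked for variable i (modes 2, 3 and 4)
 *   returned[i]  : set to 1 for each variable that would be returned
 * Returns the number of variables selected, or -1 for an unknown mode.
 * In mode 4 the result is 0 while any requested variable is not yet new;
 * the real call would block at that point instead of returning.
 */
int select_vars(int mode, const int is_new[], const int requested[],
                int returned[], int num_vars)
{
    int i, count = 0, ready = 1;

    assert(is_new != NULL && requested != NULL && returned != NULL);
    if (mode < 0 || mode > 4)
        return -1;

    if (mode == 4)
        for (i = 0; i < num_vars; i++)
            if (requested[i] && !is_new[i])
                ready = 0;              /* still waiting for this variable */

    for (i = 0; i < num_vars; i++) {
        int want  = (mode == 0 || mode == 1) ? 1 : requested[i];
        int fresh = (mode == 1 || mode == 3) ? is_new[i] : 1;

        returned[i] = (ready && want && fresh) ? 1 : 0;
        count += returned[i];
    }
    return count;
}
```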

The broadcast system is sensitive to timing. As discussed in Section 4, it was created
primarily for applications where interprocessor communication is infrequent. Descriptions of the
timing problems and their current solutions follow. The next generation of distributed utilities
is intended to alleviate these problems.

The  interactions  of  the  group  commands,  creation,  joining,  resigning  and  removal,  can 
cause  one  process  to  try  to  join  a  non-existent  group  or  to  try  to  remove  an  active  group.  The 
bc_join_grp  function  has  an  error  return  that  must  be  checked  by  the  user  program  to  see  if 
the  join  was  successful;  if  not,  it  should  be  tried  again.  The  bc_remove_grp  function  blocks 
until  the  group  is  inactive.  These  have  proved  to  be  adequate  solutions  to  these  problems. 
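
The retry advice for bc_join_grp can be wrapped in a bounded loop. The sketch below takes the
join operation as a function pointer so that it compiles without the DPUP library. The name
join_with_retry, the max_tries limit, the int argument type, and the convention that a negative
return means failure are all assumptions made for illustration; the text says only that
bc_join_grp has an error return.

```c
#include <assert.h>
#include <stddef.h>

/* Signature assumed here for a join call such as bc_join_grp(bc_id). */
typedef int (*join_fn)(int bc_id);

/*
 * Call join(bc_id) until it succeeds or max_tries attempts have failed.
 * Returns the id_index from the successful call, or -1 if every attempt
 * failed.  A real program would probably pause between attempts.
 */
int join_with_retry(join_fn join, int bc_id, int max_tries)
{
    int attempt;

    assert(join != NULL && max_tries > 0);
    for (attempt = 0; attempt < max_tries; attempt++) {
        int id_index = join(bc_id);

        if (id_index >= 0)
            return id_index;
    }
    return -1;
}

/* Stand-in used only to exercise the loop: fails twice, then returns 5. */
static int fake_calls = 0;

int fake_join(int bc_id)
{
    (void) bc_id;
    return (++fake_calls < 3) ? -1 : 5;
}
```

Bounding the number of attempts keeps a process from spinning forever if the group is never
created, which the unbounded "try again" advice would otherwise allow.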

There can be problems synchronizing the creation and initialization of data in a
broadcast group with requests to read the data. Typically one process creates and initializes the
group. If other processes don't wait until this step is complete, they may request data and
receive uninitialized values. This can be avoided by reading data in mode 4 the first time, so
the process blocks until there is actually data available. This has proven an adequate solution
to this problem.

In order to deal with the problems of non-atomic data mentioned in Section 4.9, the
bc_server may take somewhat drastic action. As explained in Section 4.9, data broadcast by
members of the group may or may not be received and recorded by one or more bc_servers.
This action is a local one, taken by each bc_server, yet the bc_send is a global operation. The
result is that different computers participating in a distributed computation may have different
values of the broadcast variables, even if no messages remain to be read. Therefore, the
current implementation of broadcast variables is best suited to the type of coarse grain
parallel algorithms discussed at the end of Section 2.


5.  Experience  Using  DPUP 


Several  parallel  computation  projects  at  the  University  of  Colorado  have  successfully 
implemented  and  tested  distributed  concurrent  applications  algorithms  built  upon  DPUP. 
Almost  all  these  algorithms  use  the  point  to  point  facilities  described  in  Section  3,  because  they 
were  available  to  users  earlier  than  the  broadcast  facilities.  This  section  briefly  summarizes 
some  of  this  research. 

The predominant use of DPUP has been in the development of parallel algorithms for
problems from optimization and VLSI design. Many problems in these fields seem to be
amenable to solution by coarse grain parallel algorithms that require little shared data or
interprocess communication. Thus they appear to be good candidates for efficient parallel
implementation on a network of computers.

Byrd et al. [1986] have used DPUP to construct several concurrent global optimization
algorithms. The global optimization problem is to find the lowest minimum of a function of
real variables that may have multiple local minima, i.e. lowest points in some region of the
variable space. This problem is difficult and expensive to solve and thus parallel algorithms
are of interest. Byrd et al. propose a synchronous stochastic method that has three main parts:
a Monte Carlo search of the variable space, a phase that essentially performs nearest neighbor
calculations, and a phase where multiple local minimizations are conducted concurrently. The
parallelism is at a very high level and relatively few messages between processes are required.
Several methods of this type have been implemented using DPUP and tested on networks of 4
and 8 Sun-3 workstations, with encouraging results. Speedups are often 80% of optimal or
higher on problems where function evaluation is expensive.

Trienekens [1986] used DPUP to solve two discrete optimization problems, the knapsack
problem and the traveling salesman problem, on a network of computers. These problems
frequently are solved by branch and bound methods which dynamically construct a tree of simpler
subproblems, from whose solutions the solution of the original problem is obtained. The
parallel methods use the master process to generate the tree of subproblems and monitor progress,
and distribute the solution of the subproblems to the various computers on the network. On a
75-city traveling salesman problem, an implementation of the parallel algorithm using DPUP
ran over 4.6 times as fast on 5 Pyramid P90-X computers as the same algorithm on one
Pyramid.

Several projects in parallel algorithms for VLSI design problems have been based upon
DPUP. Moceyunas [1986] has used DPUP to construct parallel simulated annealing algorithms
for the optimal placement of a VLSI chip. This work has shown that simulated annealing can,
through several partitioning strategies, gain performance advantages from parallel solution on
a local area network of computers. Work on parallel algorithms for Boolean function
minimization currently is underway. This work is aided by a system built on top of DPUP by Mueller
[1986] that allows the network to simulate various interconnection topologies.

DPUP has been used to implement several simple iterative methods for solving systems of
linear equations. These include the chaotic relaxation algorithm of Chazan and Miranker
[1969] (given as the example in Appendix A in two versions, one using the point-to-point
facilities and the other using the broadcast facilities), and block Gauss-Seidel and SOR algorithms
for solving systems arising from elliptic partial differential equations.

DPUP also was used to build a tool for evaluating the speed and completeness of various
vendors' Ethernet hardware and Berkeley Unix interprocess communication implementations.
This benchmarking tool was run in both loopback mode and remote mode on all machines
claiming to have Berkeley Unix interprocess communication at the Portland Usenix conference
vendor exhibit in June 1985. It discovered incomplete and incorrect implementations in several
vendor products. The machines tested were Apollo, Celerity, Gould, ISI, Masscomp, MicroVax,
Pyramid, Sequent, Sun, Symmetric, and Vax. This use demonstrates the portability of DPUP.

Appendix A. Examples using Point-to-Point and Broadcast Facilities


This appendix contains two implementations, using DPUP, of a simple but nontrivial
distributed concurrent computation. The first implementation uses the point-to-point facilities to
build a master/slave version of the algorithm. The second implementation is a broadcast
version of the same algorithm.

The algorithm implemented is a "chaotic relaxation" method for solving systems of n
linear equations in n unknowns, Ax = b. This method, first proposed by Chazan and
Miranker [1969], is an asynchronous version of the Gauss-Seidel method. The Gauss-Seidel
method simply cycles through the n equations in order, solving each equation to yield a new
value for the corresponding variable. That is, when solving the kth equation, the current values
of x[i], i ≠ k, are substituted into this equation and a new value of x[k] is obtained. For certain
classes of matrices A, the iterates converge to the solution of the linear system.
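
One Gauss-Seidel sweep as described above can be sketched in a few lines of C. This is an
illustrative sequential version and is not taken from the DPUP examples; the function name
gauss_seidel_sweep and the row-major storage of A are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Perform one Gauss-Seidel sweep on the n-by-n system A x = b.
 * A is stored row-major: A[k*n + i] is the coefficient of x[i] in
 * equation k.  x is updated in place, so equation k already sees the
 * new values of x[0..k-1].  Returns the largest change in any x[k].
 */
double gauss_seidel_sweep(const double *A, const double *b, double *x, int n)
{
    double max_change = 0.0;
    int k, i;

    assert(A != NULL && b != NULL && x != NULL && n > 0);
    for (k = 0; k < n; k++) {
        double sum = 0.0, new_xk, change;

        for (i = 0; i < n; i++)
            if (i != k)
                sum += A[k * n + i] * x[i];
        new_xk = (b[k] - sum) / A[k * n + k];   /* solve equation k for x[k] */
        change = new_xk - x[k];
        if (change < 0)
            change = -change;
        if (change > max_change)
            max_change = change;
        x[k] = new_xk;
    }
    return max_change;
}
```

Repeating the sweep until the returned change falls below a tolerance gives the usual
stopping rule, the same kind of test the listings apply with RLT_PRECISION.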

In the chaotic version of the Gauss-Seidel method, the order of processing the equations is
arbitrary, and the computation for processing the kth equation may use arbitrarily old values
of the other variables. Chazan and Miranker have shown necessary conditions for such an
algorithm to converge to the correct solution.

A simple parallel version of the chaotic relaxation algorithm is obtained by creating one
process to handle each equation. Process k repeatedly solves the kth equation for the kth
variable, using whatever values of the other variables it currently has, and then sends its new
value of x[k] to the other processes. This is the method implemented below. In the
point-to-point version, each process repeatedly sends its new value of its variable to the master, obtains
the master's latest values of the other variables, and performs its next iteration. The master
monitors the computation and decides when to terminate it. In the broadcast version, each
process repeatedly broadcasts its new value of its variable, obtains the values of the other
variables from its bc_server, and performs its next iteration. An initiating process is used to
start, monitor, and terminate the algorithm.
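
The effect of working with stale values can be explored without a network by simulating the
processes in one program. The sketch below is such a simulation, built around a hypothetical
chaotic_relax routine in which each simulated process keeps a private copy of x and refreshes
it only every few rounds, at a point staggered by process number. This schedule is an invented
stand-in for communication delay and does not reproduce the behaviour of either DPUP version.

```c
#include <assert.h>
#include <stddef.h>

#define MAXN 8   /* largest system this sketch handles (arbitrary limit) */

/*
 * Simulate chaotic relaxation for A x = b (A row-major, n <= MAXN).
 * Simulated process k solves equation k for x[k] using its private,
 * possibly stale, copy of the other variables, then publishes x[k].
 * Process k refreshes its private copy from the published values only
 * in rounds where (round + k) is a multiple of `refresh`.
 * Runs `rounds` passes over all processes; x holds the published values.
 */
void chaotic_relax(const double *A, const double *b, double *x,
                   int n, int refresh, int rounds)
{
    double local[MAXN][MAXN];
    int r, k, i;

    assert(A != NULL && b != NULL && x != NULL);
    assert(n > 0 && n <= MAXN && refresh > 0);

    for (k = 0; k < n; k++)              /* every process starts from x */
        for (i = 0; i < n; i++)
            local[k][i] = x[i];

    for (r = 0; r < rounds; r++) {
        for (k = 0; k < n; k++) {
            double sum = 0.0;

            if ((r + k) % refresh == 0)  /* "receive" the latest values */
                for (i = 0; i < n; i++)
                    local[k][i] = x[i];
            for (i = 0; i < n; i++)
                if (i != k)
                    sum += A[k * n + i] * local[k][i];
            x[k] = (b[k] - sum) / A[k * n + k];   /* "send" the new x[k] */
        }
    }
}
```

For a strictly diagonally dominant matrix the published values still settle on the solution,
only more slowly than with a refresh interval of 1.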

Each version of the example contains five procedures. Main is the driver which includes
the creation of the remote processes and the communication structures. Proc_ctrler monitors
the computation and, in the point-to-point example, is the master in the master/slave
communications pattern. Chaos_rmt is the remote process that repeatedly solves the kth equation
for the kth variable. The input and output procedures, input_data and output_rlts, are included
for completeness; they do not contain any DPUP function calls.
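
The monitoring done by Proc_ctrler amounts to keeping one convergence bit per process. A
minimal sketch of that bookkeeping follows. The helper name update_done_flags and the use of
an unsigned mask are choices made here for illustration, while the convention that a bit set
to 1 means "not yet converged" follows the comments in the listing.

```c
#include <assert.h>

/*
 * Update the convergence mask after process proc_num reports new_x.
 * Bit proc_num is cleared when |new_x - old_x| <= precision and set
 * otherwise; the computation is finished when the mask is zero.
 */
unsigned long update_done_flags(unsigned long done_flags, int proc_num,
                                double old_x, double new_x, double precision)
{
    double diff = new_x - old_x;

    assert(proc_num >= 0 && proc_num < (int) (8 * sizeof done_flags));
    if (diff < 0)
        diff = -diff;
    if (diff <= precision)
        done_flags &= ~(1UL << proc_num);   /* converged: clear the bit */
    else
        done_flags |= (1UL << proc_num);    /* still changing: set the bit */
    return done_flags;
}
```

Clearing with AND-NOT rather than toggling means a process that reports convergence twice in
a row stays marked as converged.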


example.chaos

/*
** POINT-TO-POINT VERSION OF THE CHAOTIC RELAXATION ALGORITHM
**
** chaos_0.h : include file
**
** Description :
**      This file contains the parameters used by the master and
**      the remote processes. In particular :
**      -- the names of the output file, the remote process;
**      -- the number of different hosts on which the remote processes will
**         be started;
**      -- the constants related to the number of equations in the system;
**      -- the value used for the stopping condition;
**
** Timothy Gardner -- March 1984.
** Carla Mowers and Isabelle Gerard -- June 1986
*/

#define OUTPUT_FILE "/tools/dpup/examples/strm/chaos0_output"

/* name of remote processes */
#define PROC_NAME "/tools/dpup/examples/strm/chaos_rmt_0"

/*
** number of different hosts on which you want to run the remote processes
** ( maximum 7 )
*/
#define NUMHOSTS 7

/* name of the hosts */
#define HOST_NAME0 "molson"
#define HOST_NAME1 "bass"
#define HOST_NAME2 "watneys"
#define HOST_NAME3 "heineken"
#define HOST_NAME4 "anchor"
#define HOST_NAME5 "guiness"
#define HOST_NAME6 "becks"

/* maximum number of processes which can be started */
#define MAXNPROC 32

/*
** a solution has been found when all old x values differ from the
** new x values by no more than RLT_PRECISION
*/
#define RLT_PRECISION 0.0001


/*
** POINT-TO-POINT VERSION OF THE CHAOTIC RELAXATION ALGORITHM
** This is the master process
**
** Usage: chaos data_file
**
** Description :
**      The purpose of this program is to find the result X of the
**      equation AX = B, where A is an n*n matrix, and X and B are vectors
**      of size n.
**      The master process reads in the values of n, A, and B from
**      the input file. Then it creates n remote processes each of which
**      is going to solve an equation of the form :
**      A[proc_num][] X[proc_num] = B[proc_num] ( where proc_num is the
**      number of the current process ) .
**
** Data_file format:
**      n
**      A11 A12 A13 A14 .... A1n B1
**      A21 A22 A23 A24 .... A2n B2
**
** Timothy Gardner -- April 1984.
** Carla Mowers and Isabelle Gerard -- June 1986
*/

#include <stdio.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include "/tools/dpup/src/dpup_user.h"
#include "/tools/dpup/examples/strm/chaos_0.h"

struct proc
{
    int proc_sock;          /* remote process socket descriptor */
    int equ_num;            /* equation number process is */
    struct rmt_stat host;   /* to receive the name of the host */
};

char *Host_nms[2] = {NULL, NULL};       /* used to receive the name of the */
                                        /* host on which a process is created */
char *Proc_nm[2] = {PROC_NAME, NULL};   /* used to specify the name of the */
                                        /* remote process to be created */

/* array of hosts on which remote */
/* processes can be started */
char *Host_names[8] = {HOST_NAME0, HOST_NAME1, HOST_NAME2,
                       HOST_NAME3, HOST_NAME4, HOST_NAME5,
                       HOST_NAME6,
                       NULL};

int Steps;                          /* number of steps to get the result */
struct proc Proc_table[MAXNPROC];   /* to keep information about remote */
                                    /* processes created */
FILE *Ofd;                          /* file descriptor for output file */

main(argc, argv)
int argc;
char *argv[];
{
    double data_array[MAXNPROC][MAXNPROC + 2];  /* array to hold input data */
    double x_array[MAXNPROC];       /* array to hold calculated x values */
    int num_equ;                    /* number of equations in system */
    int procs[MAXNPROC];            /* process number, indexed by socket */
    register int i;
    int result;

    Steps = 0;

    /* open the output file */
    if ((Ofd = fopen(OUTPUT_FILE, "w")) == NULL)
    {
        printf("cannot open OUTPUT_FILE\n");
        exit(1);
    }

    /* read data from the input file */
    input_data(argc, argv, &num_equ, data_array);

    /* start with all x values equal to zero */
    for (i = 0; i < num_equ; i++)
        x_array[i] = 0;

    /*
    ** create one remote process for each equation and send the A and b
    ** values and the initial x values to that process
    */
    for (i = num_equ; i > 0; i--)
    {
        Host_nms[0] = Host_names[(i - 1) % NUMHOSTS];
        if ((Proc_table[i - 1].proc_sock =
                 dp_create_proc(&(Proc_table[i - 1].host),
                                Host_nms, Proc_nm, 0)) < 0)
        {
            printf("master: create_proc failure\n");
            dp_status();
            exit(1);
        }

        /* send the values of A and B to the remote process */
        result = dp_write(Proc_table[i - 1].proc_sock,
                          (char *) &data_array[i - 1][0], (num_equ + 3) * sizeof(double));

        /* send the initial value of X to the remote process */
        result = dp_write(Proc_table[i - 1].proc_sock,
                          (char *) &x_array[0], num_equ * sizeof(double));

        /* record socket descriptor of remote process */
        procs[Proc_table[i - 1].proc_sock] = i - 1;
    }

    /* compute the result X of the equation AX = B */
    proc_ctrler(procs, num_equ, x_array);

    /* output the results */
    output_rlts(num_equ, data_array, x_array);
}

/*
** proc_ctrler -- routine to handle the sending and receiving of data
**                to and from the remote processes
*/
proc_ctrler(procs, num_equ, x_array)
int procs[], num_equ;
double x_array[];
{
    int sock;               /* socket descriptor */
    int fds_mask;           /* select function masks */
    int tmp_mask;           /* saves value of select mask */
    int nfds;               /* number of selected sockets */
    int proc_num;           /* process number */
    int done_flags;         /* when all bits are zero a solution */
                            /* has been found */
                            /* this variable is updated each time a */
                            /* remote process returns a new x value */
    double result;          /* result returned by remote process */
    double diff;            /* diff between new and old x values */
    register int i;
    struct timeval timeout; /* used by select for */
                            /* polling sockets */

    timeout.tv_sec = 0;
    timeout.tv_usec = 0;

    /* clear flags -- i.e. a cleared flag is a bit set to 1 */
    /*                     a marked flag is a bit set to 0 */
    done_flags = 0;
    for (i = 0; i < num_equ; i++)
        done_flags |= (1 << i);

    while (done_flags > 0)  /* a solution has not yet been found */
    {
        fds_mask = 0;

        /* set up file descriptor mask used by the select function */
        for (proc_num = 0; proc_num < num_equ; proc_num++)
            fds_mask |= (1 << Proc_table[proc_num].proc_sock);

        /* fds_mask must be restored after each select call */
        /* because select clears fds_mask if no fd is ready */
        /* to be read */
        tmp_mask = fds_mask;

        /* wait for remote processes to return results */
        /* by polling each socket until data is present */
        while ((nfds = select(_NFILE - 1, &fds_mask, 0, 0, &timeout)) <= 0)
            fds_mask = tmp_mask;

        sock = 0;
        while (nfds)
        {
            /* determine processes that have returned results */
            /* by checking each bit in the mask returned by */
            /* select.  set bits mark ready to read fd's */
            for (;;)
            {
                if (fds_mask & 1)
                {
                    fds_mask = fds_mask >> 1;
                    break;
                }
                sock++;
                fds_mask = fds_mask >> 1;
            }
            proc_num = procs[sock];

            /* read result returned from remote process */
            dp_read(Proc_table[proc_num].proc_sock, (char *) &result, sizeof(double), 1);

            diff = result - x_array[proc_num];
            if (diff < 0)
                diff *= -1;
            if (diff <= RLT_PRECISION)
                /* set the done_flag for this process by */
                /* clearing the proc_num th bit of done_flags */
                done_flags &= ~(1 << proc_num);
            else
                /* clear the done_flag for this process by */
                /* setting the proc_num th bit of done_flags */
                done_flags |= (1 << proc_num);

            /* update the x value array with new x value */
            x_array[proc_num] = result;
            fprintf(Ofd, "step number : %2d ", Steps);
            fprintf(Ofd, " x[%d] = %8.8lf\n", proc_num, result);
            fflush(Ofd);

            /* a solution has been found when all process have */
            /* returned a new x value that is not different by */
            /* more than RLT_PRECISION */
            if (done_flags == 0)
                break;

            /* send new x values to the remote process */
            dp_write(Proc_table[proc_num].proc_sock, (char *) x_array, num_equ * sizeof(double));

            Steps++;
            nfds--;
            sock++;
        }
    }

    /* a solution has been found so terminate remote processes */
    for (i = 0; i < num_equ; i++)
        dp_kill_proc(&(Proc_table[i].host));
}

/*
** input_data -- routine to fetch data from the data file
*/
input_data(argc, argv, num_equ, data_array)
int argc, *num_equ;
double data_array[][MAXNPROC + 2];
char *argv[];
{
    register int row, col;
    FILE *data_fp;

    if (argc < 2)
    {
        printf("usage: %s datafile\n", argv[0]);
        exit(1);
    }
    if ((data_fp = fopen(argv[1], "r")) == NULL)
    {
        printf("cannot open %s\n", argv[1]);
        exit(1);
    }
    if (fscanf(data_fp, "%d", num_equ) < 1)
    {
        printf("data format error in %s\n", argv[1]);
        exit(1);
    }
    for (row = 0; row < *num_equ; row++)
    {
        /* place number of variables and the equation number in */
        /* each array along with the A and b values */
        data_array[row][0] = (double) *num_equ;
        data_array[row][1] = (double) row;
        for (col = 2; col < *num_equ + 3; col++)
        {
            if (fscanf(data_fp, "%lf", &data_array[row][col]) < 1)
            {
                printf("data format error in %s\n", argv[1]);
                exit(1);
            }
        }
    }
}

/*
** output_rlts -- routine to output the final results
*/
output_rlts(num_equ, data_array, x_array)
int num_equ;
double data_array[][MAXNPROC + 2];
double x_array[];
{
    register int i, row, col;

    fprintf(Ofd, "\n\nequation num          x value\n");
    fprintf(Ofd, "------------          -------\n");
    for (i = 0; i < num_equ; i++)
        fprintf(Ofd, "   %d   %16.12lf\n", i+1, x_array[i]);
    fflush(Ofd);
    fprintf(Ofd, "\n\n");
    fprintf(Ofd, "program statistics\n");
    fprintf(Ofd, "------------------\n");
    fflush(Ofd);
    fprintf(Ofd, "number of hosts         = %d\n", 3);
    fprintf(Ofd, "number of processes     = %d\n", num_equ);
    fprintf(Ofd, "total number of steps   = %d\n", Steps);
    fprintf(Ofd, "degree of precision     = %lf\n", RLT_PRECISION);
    fprintf(Ofd, "\n");
    fflush(Ofd);
}

£  X 
£  * 

** 
** 
*  * 


POINT-TO-POINT  VERSION  OF  THE  CHAOTIC  RELAXATION  ALGORITHM 
This  is  the  remote  process 

remote  process  to  calculate  a  new  x  value  for  the  equation  sent 
from  the  master  process 

Timothy  Gardner  —  April  1984 

Carla  Mowers  and  Isabelle  Gerard  —  June  1986 


^include  <stdio.h> 

^include  ”  /tools  /dpup /examples  /strm /chaos_0.h" 


main(argc,  argv) 
int  argc; 

ebar  *argv[|; 

{ 

int 

int 

char 

ebar 

double 

double 

double 

register  int 


main 


sock; 
length; 

data_buf(500]; 
x_values[500j; 
sum,  aum_equ,  proc_aum; 
b__value; 

*dbl_ptr,  ,'x_ptr; 

i; 


/*  socket  connection  to  master 
j*  amount  of  data  read 
/  *  buffer  to  store  A  and  b  values  * / 
/*  buffer  to  store  x  values  */ 


7 


/  '  get  socket  descriptor  to  master  process  * / 
sock  =  dp_rmt_secup(&argc,  argv); 

if  (sock  <  0} 

{ 

priattT'remote:  rmt__setup  error/ a’); 
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example. chaos 


exam  pie. chaos 


...main 

dp_3tatus(); 

dp_rmt;_exit(); 

} 

/  “  read  the  values  of  A,  3,  and  X  */ 
if  (dp_read(sock,  daca_buf,  0,  1)  <  0) 

{ 

prin.tf("remote:  read  error  L\ a"); 

dp_status(); 

dD_rrun_exit(); 

} 


dbl_ptr  =  (double  *)  data_buf; 

aum_equ  =  dbl _ ptrfOj; 

prac_aum  =  dbl _ ptr(l{; 

b_value  =  dbl_ptr[((infc)  aum_equ)-r2j; 

    /* continue processing new x values */
    /* when a solution has been found the master will terminate us */
    for (;;)
    {
        /* read the value of X */
        if (dp_read(sock, (char *) x_values, 0, 1) < 0)
        {
            printf("remote: read error 2\n");
            dp_status();
            dp_rmt_exit();
        }
        x_ptr = (double *) x_values;

        sum = 0;
        for (i = 0; i < num_equ; i++)
        {
            if (i != proc_num)
            {
                /* the coefficients of this row of A start at dbl_ptr[2] */
                sum += dbl_ptr[i + 2] * x_ptr[i];
            }
        }
        sum = (b_value - sum) / dbl_ptr[((int) proc_num) + 2];

        /* send calculated x value back to the master */
        if (dp_write(sock, (char *) &sum, sizeof(double)) < sizeof(double))
        {
            printf("remote: dp_write error\n");
            dp_status();
            dp_rmt_exit();
        }
    }
}
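The update that each remote process performs can be tried on a single machine before any sockets are involved. The following is a minimal serial sketch, not part of DPUP: the names `relax_row` and `relax_solve`, the fixed size `N`, and the sweep limit are all assumptions made for illustration. It applies the same row update as the listing above, x_i = (b_i - sum over j != i of a_ij x_j) / a_ii, but visits the rows in a fixed order, whereas the distributed version lets each process update at its own pace.

```c
#include <assert.h>

#define N 3   /* hypothetical system size for this sketch */

/* One relaxation update for equation i:
   x_i = (b_i - sum over j != i of a[i][j] * x[j]) / a[i][i] */
double relax_row(const double a[N][N], const double b[N],
                 const double x[N], int i)
{
    double sum = 0.0;
    int j;

    assert(a[i][i] != 0.0);
    for (j = 0; j < N; j++) {
        if (j != i)
            sum += a[i][j] * x[j];
    }
    return (b[i] - sum) / a[i][i];
}

/* Sweep over all rows until no x value changes by more than tol.
   Returns the number of sweeps used, or -1 if max_sweeps is exhausted. */
int relax_solve(const double a[N][N], const double b[N],
                double x[N], double tol, int max_sweeps)
{
    int sweep, i;

    for (sweep = 1; sweep <= max_sweeps; sweep++) {
        double worst = 0.0;

        for (i = 0; i < N; i++) {
            double new_x = relax_row(a, b, x, i);
            double diff = new_x - x[i];

            if (diff < 0)
                diff = -diff;
            if (diff > worst)
                worst = diff;
            x[i] = new_x;   /* later rows in this sweep see the new value */
        }
        if (worst <= tol)
            return sweep;
    }
    return -1;
}
```

Called on a diagonally dominant system such as A = {{4,1,1},{1,5,2},{1,2,6}}, B = {9,17,23} with X initially zero, `relax_solve` should approach the solution (1, 2, 3). Convergence is not guaranteed for arbitrary matrices, which is why the sketch carries a sweep limit.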


/*
**  BROADCAST VERSION OF THE CHAOTIC RELAXATION ALGORITHM
**
**  chaos_1.h : include file
**  Description :
**      This file contains the parameters used by the master and
**      the remote processes. In particular :
**      - the names of the output file, the debugging file, the remote
**        process;
**      - the number of different hosts on which the remote processes will
**        be started;
**      - the constants related to the number of equations in the system;
**      - the value used for the stopping condition;
**      - the structure describing the data of the group.
**
**  Carla Mowers and Isabelle Gerard - June 1986
*/

#define OUTPUT_FILE "/tools/dpup/examples/dgrm/chaos_1_output"

/* name of remote processes */
#define PROC_NAME "/tools/dpup/examples/dgrm/chaos_rmt_1"

/*
** number of different hosts on which you want to run the remote processes
** ( maximum 7 )
*/
#define NUMHOSTS 7

/* name of the hosts */
#define HOST_NAME0 "molson"
#define HOST_NAME1 "bass"
#define HOST_NAME2 "guiness"
#define HOST_NAME3 "anchor"
#define HOST_NAME4 "heineken"
#define HOST_NAME5 "becks"
#define HOST_NAME6 "watneys"

/* maximum number of processes which can be started */
#define MAXNPROC 32

/* size of the buffers to send and receive data */
#define BUFSIZE (sizeof(double) * MAXNPROC * MAXNPROC + \
                 (2 * sizeof(double) + sizeof(short)) * MAXNPROC)

/*
** number of variables in the group
** MAXNPROC for each line of A, B, X, and done_flag
*/
#define NUMVARS (4 * MAXNPROC)

/*
** a solution has been found when all old x values differ from the
** new x values by no more than RLT_PRECISION
*/
#define RLT_PRECISION 0.0001

/* structure describing the data of the group */
struct chaos
{
    double  A[MAXNPROC][MAXNPROC];   /* matrix A */
    double  B[MAXNPROC];             /* vector B */
    double  X[MAXNPROC];             /* solution of the equation */
    short   done_flag[MAXNPROC];     /* flag to tell a remote is done */
};
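The broadcast listings address the group data through a flat array of NUMVARS flags: MAXNPROC entries for the rows of A, then MAXNPROC each for B, X, and done_flag. The sketch below spells out that index arithmetic. It is an illustration only; `group_part`, `var_index`, and `select_x_and_done` are hypothetical names that do not appear in DPUP, and the selection helper is simplified in that it does not skip processes that have already finished.

```c
#include <assert.h>

#define MAXNPROC 32               /* same value as in chaos_1.h */
#define NUMVARS  (4 * MAXNPROC)

/* Group variable indices, as the listings use them:
   [0, MAXNPROC)              row i of A
   [MAXNPROC, 2*MAXNPROC)     B[i]
   [2*MAXNPROC, 3*MAXNPROC)   X[i]
   [3*MAXNPROC, 4*MAXNPROC)   done_flag[i] */
enum group_part { PART_A_ROW = 0, PART_B = 1, PART_X = 2, PART_DONE = 3 };

/* hypothetical helper: index of variable i within one part of the group */
int var_index(enum group_part part, int i)
{
    assert(i >= 0 && i < MAXNPROC);
    return (int) part * MAXNPROC + i;
}

/* hypothetical helper: clear every flag, then select X[i] and
   done_flag[i] for the first num_equ processes */
void select_x_and_done(char var_flags[NUMVARS], int num_equ)
{
    int i;

    for (i = 0; i < NUMVARS; i++)
        var_flags[i] = 0;
    for (i = 0; i < num_equ; i++) {
        var_flags[var_index(PART_X, i)] = 1;
        var_flags[var_index(PART_DONE, i)] = 1;
    }
}
```

With MAXNPROC equal to 32, `var_index(PART_B, 5)` is 37 and `var_index(PART_X, 0)` is 64, which matches the `2 * MAXNPROC + i` expressions in the master and remote listings.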


/*
**  BROADCAST VERSION OF THE CHAOTIC RELAXATION ALGORITHM
**  This is the master process
**  Usage: chaos data_file
**  Description :
**      The purpose of this program is to find the result X of the
**      equation AX = B, where A is an n*n matrix, and X and B are vectors
**      of size n.
**
**      The master process reads in the values of n, A, B and the
**      initial values of X, from the input file. Then it creates n remote
**      processes each of which is going to solve an equation of the form
**      A[proc_num][] X[proc_num] = B[proc_num] ( where proc_num is the
**      number of the current process ).
**
**  Input_file format:
*/


#include <stdio.h>
#include <signal.h>
#include "/tools/dpup/src/bdct.h"
#include "/tools/dpup/src/dpup_user.h"
#include "/tools/dpup/examples/dgrm/chaos_1.h"


char  *Host_nms[2] = {NULL, NULL};        /* to receive the name of the host */
                                          /* on which a process is created */
char  *Proc_nm[2] = {PROC_NAME, NULL};    /* used to specify the name of the */
                                          /* remote process to be created */

/* array of hosts on which remote */
/* processes can be started */
char  *Host_names[8] = {HOST_NAME0, HOST_NAME1, HOST_NAME2,
                        HOST_NAME3, HOST_NAME4, HOST_NAME5,
                        HOST_NAME6, NULL};

int   Steps;    /* number of steps to get the result */
FILE  *Ofd;     /* file descriptor for output file */

main(argc, argv)
int   argc;
char  *argv[];
{
    struct chaos    *c_ptr;               /* pointer to the structure describing */
                                          /* the data of the group */
    int             proc_sock[MAXNPROC];  /* array of the sockets to the remote */
                                          /* processes */
    char            data[BUFSIZE];        /* buffer to send/receive data */
    char            proc_num[MAXNPROC];   /* number of process to be created */
    short           var_table[NUMVARS];   /* array containing the sizes of the */
                                          /* data of the group */
    char            var_flags[NUMVARS];   /* flags to tell which data to send */
                                          /* or receive */
    int             num_equ;              /* number of equations */
    struct rmt_stat host;                 /* to receive the name of the host on */
                                          /* which a remote process is created */
    int             result;               /* number of bytes sent / received */
    struct bd_id    id;                   /* id of the broadcast group */
    short           id_index;             /* index of the current process in */
                                          /* the broadcast group */
    char            str_num_equ[10];      /* to pass the number of equations to */
                                          /* the remote processes */
    register short  i, j;

    c_ptr = (struct chaos *) data;
    Steps = 0;


    /* open the output file */
    if ((Ofd = fopen(OUTPUT_FILE, "w")) == NULL)
    {
        perror("fopen");
        bc_close();
        exit(1);
    }

    /* initialization of the data of the group */
    input_data(argc, argv, &num_equ, c_ptr);

    /* set the var_flags corresponding to A, B and X, to 1 */
    for (i = 0; i < NUMVARS; i++)
    {
        var_flags[i] = 0;
    }
    for (i = 0; i < num_equ; i++)
    /* each of these flags corresponds to a line of the matrix A */
    {
        var_flags[i] = 1;
    }
    for (i = MAXNPROC; i < MAXNPROC + num_equ; i++)
    /* each of these flags corresponds to an element of the vector B */
    {
        var_flags[i] = 1;
    }
    for (i = 2 * MAXNPROC; i < 2 * MAXNPROC + num_equ; i++)
    /* each of these flags corresponds to an element of the vector X */
    {
        var_flags[i] = 1;
    }

    /* initialize the var_table in which the sizes of the variables */
    /* are given */
    for (i = 0; i < MAXNPROC; i++)
    /* corresponds to a line of the matrix A */
    {
        var_table[i] = sizeof(double) * MAXNPROC;
    }
    for (i = MAXNPROC; i < 2 * MAXNPROC; i++)
    /* corresponds to an element of the vector B */
    {
        var_table[i] = sizeof(double);
    }
    for (i = 2 * MAXNPROC; i < 3 * MAXNPROC; i++)
    /* corresponds to an element of the vector X */
    {
        var_table[i] = sizeof(double);
    }
    for (i = 3 * MAXNPROC; i < 4 * MAXNPROC; i++)
    /* corresponds to a stop flag for each of the remote processes */
    {
        var_table[i] = sizeof(short);
    }

    /* establish a connection with the local broadcast server */
    result = bc_open();
    if (result < 0)
    {
        printf("bc_open error\n");
        dp_status();
        exit(1);
    }

    /* set up a new broadcast group */
    result = bc_create_grp(&id, var_table, NUMVARS);
    if (result < 0)
    {
        printf("bc_create_grp error\n");
        dp_status();
        exit(1);
    }

    /* join the broadcast group */
    id_index = bc_join_grp(&id);
    if (id_index < 0)
    {
        printf("bc_join_grp error\n");
        dp_status();
        exit(1);
    }

    /* create the remote processes */
    sprintf(str_num_equ, "%d", num_equ);
    for (i = num_equ; i > 0; i--)
    {
        Host_nms[0] = Host_names[(i-1) % NUMHOSTS];
        sprintf(proc_num, "%d", i-1);
        if ((proc_sock[i-1] = dp_create_proc(&host, Host_nms,
                Proc_nm, str_num_equ, proc_num, 0)) < 0)
        {
            fprintf(stderr, "dp_create_proc failure\n");
            dp_status();
            exit(1);
        }

        /* send the id of the broadcast group to the new process */
        result = dp_write(proc_sock[i-1], (char *) &id, sizeof(id));
        if (result != sizeof(id))
        {
            printf("write error\n");
            printf("result = %d\n", result);
            dp_status();
            exit(1);
        }
    }

    /* broadcast the data of the group */
    result = bc_send(id_index, var_flags, data);

    /* compute the result X of the equation AX = B */
    proc_ctrler(num_equ, c_ptr, data, var_flags, id_index);

    /* output the results */
    output_rlts(num_equ, c_ptr);

    /* all the processes are terminated so terminate the master */

    /* resign from the broadcast group */
    result = bc_resign_grp(id_index);
    printf("master:result of bc_resign_grp = %d\n", result);

    /* remove the broadcast group */
    result = bc_remove_grp(&id);
    printf("master:result of bc_remove_grp = %d\n", result);

    /* terminate connection with the broadcast server */
    result = bc_close();
    printf("master:result of bc_close = %d\n", result);
}


/*
**  proc_ctrler : procedure to exchange data with the remote processes
**                until a solution to the equation AX = b is found
*/

proc_ctrler(num_equ, c_ptr, data, var_flags, id_index)
int           num_equ;
struct chaos  *c_ptr;
char          data[BUFSIZE];
char          var_flags[NUMVARS];
int           id_index;
{
    int  proc_num;      /* process number */
    int  done_flags;    /* when equal to zero a solution has */
                        /* been found */
    int  result;        /* number of bytes received/sent */
    int  i;


    done_flags = 1;
    while (done_flags > 0)    /* a solution has not yet been found */
    {
        /* set var_flags to receive X and the done_flags */
        for (i = 0; i < NUMVARS; i++)
        {
            var_flags[i] = 0;
        }
        for (i = 0; i < num_equ; i++)
        {
            if (c_ptr->done_flag[i] == 0)
            {
                var_flags[2 * MAXNPROC + i] = 1;
                var_flags[3 * MAXNPROC + i] = 1;
            }
        }

        /* wait for remote processes to return results */
        result = bc_receive(id_index, (short) BLOCK_ON_DATA, data, var_flags);

        /* output the new values of X */
        for (i = 0; i < num_equ; i++)
        {
            Steps++;
            fprintf(Ofd, "step number : %2d ", Steps);
            fprintf(Ofd, " x[%d] = %8.5lf\n", i, c_ptr->X[i]);
            fflush(Ofd);
        }

        /* determine if all processes are done */
        done_flags = 0;
        for (i = 0; i < num_equ; i++)
        {
            if (c_ptr->done_flag[i] == 0)
            /* the remote process i is not done */
            {
                done_flags = 1;
                break;
            }
        }  /* end of for */
    }  /* end of while */
}  /* end of proc_ctrler */

/*
**  input_data : routine to read data from the file passed in
*/

input_data(argc, argv, num_equ, c_ptr)
int           argc;
char          *argv[];
int           *num_equ;
struct chaos  *c_ptr;
{
    register int  i, j;
    FILE          *data_fp;

    if (argc < 2)
    {
        printf("usage: %s datafile\n", argv[0]);
        exit(1);
    }

    /* open the input file */
    if ((data_fp = fopen(argv[1], "r")) == NULL)
    {
        printf("cannot open %s\n", argv[1]);
        exit(1);
    }

    /* read in the number of equations */
    fscanf(data_fp, "%d ", num_equ);

    /* read in the matrix A */
    for (i = 0; i < *num_equ; i++)
    {
        for (j = 0; j < *num_equ; j++)
        {
            fscanf(data_fp, "%lf", &(c_ptr->A[i][j]));
        }
    }

    /* read in the vector B */
    for (i = 0; i < *num_equ; i++)
    {
        fscanf(data_fp, "%lf", &(c_ptr->B[i]));
    }

    /* read in the initial value of the vector X */
    for (i = 0; i < *num_equ; i++)
    {
        fscanf(data_fp, "%lf", &(c_ptr->X[i]));
    }

    /* initialize the done_flags */
    for (i = 0; i < *num_equ; i++)
    {
        c_ptr->done_flag[i] = 0;
    }
}

/*
**  output_rlts : procedure to output the results
*/

output_rlts(num_equ, c_ptr)
int           num_equ;
struct chaos  *c_ptr;
{
    register int  i, row, col;

    fprintf(Ofd, "\n\nequation num        x value\n");
    fprintf(Ofd, "------------        -------\n\n");
    for (i = 0; i < num_equ; i++)
        fprintf(Ofd, "    %d        %16.12lf\n", i+1, c_ptr->X[i]);
    fprintf(Ofd, "\n\n");
    fflush(Ofd);

    fprintf(Ofd, "program statistics\n");
    fprintf(Ofd, "------------------\n\n");

    fprintf(Ofd, "number of hosts        = %d\n", NUMHOSTS);
    fprintf(Ofd, "number of processes    = %d\n", num_equ);
    fprintf(Ofd, "total number of steps  = %d\n", Steps);
    fprintf(Ofd, "degree of precision    = %lf\n", RLT_PRECISION);
    fprintf(Ofd, "\n");
    fflush(Ofd);
}
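The remote process of the broadcast version does not stop on the first small change in its x value. It counts consecutive iterations in which the change is at most RLT_PRECISION, resets the count on any larger change, and declares itself done when the count reaches TIMES. The sketch below isolates that rule so it can be checked on its own. The function name `converged` is hypothetical and not part of DPUP, and the sketch uses `>=` rather than `==` so that it keeps reporting completion if called again after the threshold is reached.

```c
#include <assert.h>

#define RLT_PRECISION 0.0001   /* same value as in chaos_1.h */
#define TIMES 2                /* same value as in the remote process */

/* hypothetical helper: update the run-length counter with the latest
   change in x and report whether the stopping condition is reached */
int converged(double old_x, double new_x, int *counter)
{
    double diff = new_x - old_x;

    if (diff < 0)
        diff = -diff;
    if (diff <= RLT_PRECISION)
        (*counter)++;
    else
        *counter = 0;          /* a large change restarts the run */
    return *counter >= TIMES;
}
```

A caller would keep one counter per process, initialised to zero, and pass the previous and newly computed x value on every iteration.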
/*
**  BROADCAST VERSION OF THE CHAOTIC RELAXATION ALGORITHM
**  This is the remote process
**
**  remote process to calculate a new x value for the equation sent
**  from the master process
**
**  Carla Mowers and Isabelle Gerard - June 1986
*/

#include <stdio.h>
#include <signal.h>
#include "/tools/dpup/src/bdct.h"
#include "/tools/dpup/src/dpup_user.h"
#include "/tools/dpup/examples/dgrm/chaos_1.h"


#define TIMES 2


extern  int  errno; 


main(argc, argv)
int   argc;
char  **argv;
{
    struct chaos    *c_ptr;              /* pointer to the structure */
                                         /* containing the data of the group */
    char            data[BUFSIZE];       /* buffer to send/receive data of */
                                         /* the group */
    char            var_flags[NUMVARS];  /* flags used to tell which data */
                                         /* are going to be sent/received */
    int             sock;                /* socket to the master process */
    int             result;              /* returned number of bytes sent or */
                                         /* received */
    int             num_equ;             /* number of equations */
    struct bd_id    id;                  /* id of the broadcast group */
    short           id_index;            /* index of the current process in */
                                         /* the broadcast group */
    FILE            *dfd;                /* file descriptor for debugging */
                                         /* file */
    char            dbg_file[50];        /* name of the debugging file */
    short           done_flag = 0;       /* flag set to tell local process */
                                         /* is done or not */
    int             proc_num;            /* number of the current process */
    double          sum;                 /* to compute the new value of */
                                         /* X[proc_num] */
    double          diff;                /* difference between new and */
                                         /* previous value of X[proc_num] */
    static int      counter = 0;         /* consecutive times where the */
                                         /* difference between the new and */
                                         /* the previous value of X[proc_num] */
                                         /* was less than RLT_PRECISION */
    register short  i, j;

    /* set up communication channel with the master process */
    sock = dp_rmt_setup(&argc, argv);
    if (sock < 0)
    {
        printf("remote: dp_rmt_setup error\n");
        dp_status();
        dp_rmt_exit();
    }
    c_ptr = (struct chaos *) data;

    /*
    ** initialize the number of equations passed as an argument by the
    ** master
    */
    num_equ = atoi(argv[1]);

    /*
    ** initialize number of the current process passed as an argument
    ** by the master process
    */
    proc_num = atoi(argv[2]);


    /* read the id of the broadcast group from the master process */
    result = dp_read(sock, (char *) &id, sizeof(id), 1);
    if (result != sizeof(id))
    {
        printf("remote: read error: errno = %d\n", errno);
        dp_rmt_exit();
    }

    /* establish a connection with the local broadcast server */
    result = bc_open();
    if (result < 0)
    {
        printf("remote: bc_open error\n");
        result = bc_close();
        printf("slave[%d]: bc_close = %d\n", proc_num, result);
        dp_status();
        dp_rmt_exit();
    }

    /* join the broadcast group */
    id_index = bc_join_grp(&id);
    if (id_index < 0)
    {
        printf("remote: bc_join_grp error\n");
        dp_status();
        result = bc_close();
        printf("slave[%d]: bc_close = %d\n", proc_num, result);
        dp_rmt_exit();
    }

    /*
    ** initialize the var_flags to receive A[proc_num][], B[proc_num]
    ** and X
    */
    for (i = 0; i < NUMVARS; i++)
    {
        var_flags[i] = 0;
    }
    var_flags[proc_num] = 1;
    var_flags[MAXNPROC + proc_num] = 1;

    /* receive the initial values of X */
    for (j = 0; j < num_equ; j++)
    {
        var_flags[2 * MAXNPROC + j] = 1;
    }


    /* receive the data */
    result = bc_receive(id_index, (short) BLOCK_ON_DATA, data, var_flags);

    done_flag = 0;
    c_ptr->done_flag[proc_num] = 0;

    while (done_flag == 0)
    /* loop until done_flag == 1 ( stopping condition is reached ) */
    {
        /* compute the new value of X[proc_num] : sum */
        sum = 0.0;
        for (j = 0; j < num_equ; j++)
        {
            if (j != proc_num)
            {
                sum += c_ptr->A[proc_num][j] * c_ptr->X[j];
            }
        }
        sum = (c_ptr->B[proc_num] - sum) / c_ptr->A[proc_num][proc_num];


        /*
        ** check the stopping condition : it is reached, if the
        ** difference between new and previous value of X[proc_num]
        ** is less or equal to RLT_PRECISION more than TIMES
        ** consecutive times
        */
        diff = sum - c_ptr->X[proc_num];
        if (diff < 0)
        {
            diff *= -1;
        }

        if (diff <= RLT_PRECISION)
        {
            counter++;
            if (counter == TIMES)
            /* stopping condition reached */
            {
                done_flag = 1;
                c_ptr->done_flag[proc_num] = 1;
            }
        }
        else
        {
            counter = 0;
        }


        /* update the X[proc_num] with the new value */
        c_ptr->X[proc_num] = sum;

        /* set the var_flags to send X[proc_num] and the done_flag */
        for (i = 0; i < NUMVARS; i++)
        {
            var_flags[i] = 0;
        }
        var_flags[2 * MAXNPROC + proc_num] = 1;
        var_flags[3 * MAXNPROC + proc_num] = 1;

        /* send X[proc_num] and the done_flag */
        result = bc_send(id_index, var_flags, data);

        if (done_flag == 1)
        /* the process is done */
        {
            /* set the var_flags to send X[proc_num] and */
            /* the done_flag */
            for (i = 0; i < NUMVARS; i++)
            {
                var_flags[i] = 0;
            }
            var_flags[2 * MAXNPROC + proc_num] = 1;
            var_flags[3 * MAXNPROC + proc_num] = 1;

            for (i = 0; i < 2; i++)
            /* send again in case the packet was discarded */
            /* by the local server of the master */
            {
                /* send X[proc_num] and done_flag[proc_num] */
                result = bc_send(id_index, var_flags, data);
            }
            break;
        }


        else
        /* the process is not done */
        {
            /*
            ** initialize the var_flags to receive the new values
            ** of X ( except X[proc_num] )
            */
            for (i = 0; i < NUMVARS; i++)
            {
                var_flags[i] = 0;
            }
            for (j = 0; j < num_equ; j++)
            {
                if (j != proc_num)
                {
                    var_flags[2 * MAXNPROC + j] = 1;
                }
            }

            /* receive the data */
            result = bc_receive(id_index, (short) ALL_REQ_DATA, data, var_flags);
        }
    }  /* end of while */

    /* resign from the broadcast group */
    result = bc_resign_grp(id_index);
    printf("slave[%d] : result of bc_resign_grp = %d\n",
           proc_num, result);

    /* terminate connection with the broadcast server */
    result = bc_close();
    printf("slave[%d] : result of bc_close = %d\n",
           proc_num, result);

    dp_rmt_exit();